Discriminator-Enhanced Knowledge-Distillation Networks

Authors

Abstract

Query auto-completion (QAC) serves as a critical functionality in contemporary textual search systems, generating real-time query completion suggestions based on a user's input prefix. Despite the prevalent use of language models (LMs) for QAC candidate generation, LM-based approaches frequently suffer from overcorrection issues during pair-wise loss training, as well as efficiency deficiencies. To address these challenges, this paper presents a novel framework, discriminator-enhanced knowledge distillation (Dis-KD), for the QAC task. The framework combines three core components: a large-scale pre-trained teacher model, a lightweight student model, and a discriminator for adversarial learning. Specifically, the discriminator aids in discerning generative-level differences between the teacher and the student models. An additional discriminator score is amalgamated with the traditional knowledge-distillation loss, resulting in enhanced performance of the student model. Contrary to the stepwise evaluation of each generated word, our approach assesses the entire generation sequence, which alleviates the overcorrection issue in the generation process. Consequently, the proposed framework boasts improvements in model accuracy alongside a reduction in parameter size. Empirical results highlight the superiority of Dis-KD over established baseline methods on QAC tasks for sub-word languages.
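The loss combination described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the weighting scheme (`alpha`), the generator-style adversarial term, and the interpretation of the discriminator score as a probability over the whole generated sequence are all assumptions introduced here for clarity.

```python
import math

def softmax(logits):
    """Convert a list of raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q); assumes q has no zero entries where p is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dis_kd_loss(teacher_logits, student_logits, disc_score, alpha=0.5):
    """Hypothetical Dis-KD-style loss: a standard distillation KL term
    plus a sequence-level adversarial term. `disc_score` in (0, 1] is
    the discriminator's probability that the student's full output
    sequence came from the teacher; a higher score lowers the loss."""
    kd = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    adv = -math.log(max(disc_score, 1e-12))  # generator-style adversarial loss
    return alpha * kd + (1 - alpha) * adv
```

Because the adversarial term scores the entire sequence at once rather than each generated word, a student that matches the teacher's overall output style is not penalized word-by-word, which is the mechanism the abstract credits with reducing overcorrection.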


Similar articles

Entrainer-enhanced Reactive Distillation

The paper presents the use of a Mass Separation Agent (entrainer) in reactive distillation processes. This can help to overcome limitations due to distillation boundaries and, at the same time, increase the degrees of freedom in design. The catalytic esterification of fatty acids with light alcohols C2-C4 is studied as an application example. Because the alcohol and water distillate in top simultan...


Data-Free Knowledge Distillation for Deep Neural Networks

Recent advances in model compression have provided procedures for compressing large neural networks to a fraction of their original size while retaining most if not all of their accuracy. However, all of these approaches rely on access to the original training set, which might not always be possible if the network to be compressed was trained on a very large dataset, or on a dataset whose relea...


Learning Loss for Knowledge Distillation with Conditional Adversarial Networks

There is increasing interest in accelerating neural networks for real-time applications. We study the student-teacher strategy, in which a small and fast student network is trained with the auxiliary information provided by a large and accurate teacher network. We use conditional adversarial networks to learn the loss function to transfer knowledge from teacher to student. The proposed method...


Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...


Topic Distillation with Knowledge Agents

This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...



Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13148041